8 research outputs found

3D Convolutional Neural Networks for Computational Drug Discovery

Author: Jocelyn Sunseri
Publication date: 06/01/2021

This thesis describes aspects of the implementation and application of voxel-based convolutional neural networks (CNNs) to problems in computational drug discovery. It opens by justifying the novelty of this approach by presenting a more mainstream approach to the common tasks of virtual screening and binding pose prediction, augmented with more simplistic machine learning methods, and demonstrating their suboptimal performance when applied prospectively. It then describes my contributions to our group’s development of voxel-based CNNs as we honed their implementation and training strategy, and reports our library that facilitates featurization and training using this approach. It continues with a prospective assessment of their performance, analogous to the first prospective evaluation, with the addition of a novel CNN-based pose sampling strategy. Next it makes a foray into model explanation, first in an oblique fashion, by examining the transferability of models to tasks that are distinct from but related to the tasks for which they were trained, and by a comparison with an approach based on exploiting dataset bias using other machine learning methods. Finally it describes the implementation of a more direct approach to model explanation, by using a trained network to perform optimization of inputs with respect to the network as a whole or individual nodes and analyzing the content of the result as well as its utility as a pseudo-pharmacophore.

D-Scholarship@Pitt

Protein-Ligand Scoring with Convolutional Neural Networks

Author: Joshua Hochuli
Elisa Idrobo
David Ryan Koes
Matthew Ragoza
Jocelyn Sunseri
Publication date: 08/12/2016

Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive 3D representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and non-binders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.

arXiv.org e-Print Archive

FigShare

Virtual Screening with Gnina 1.0

Author: David Ryan Koes
Jocelyn Sunseri
Publication venue: MDPI AG
Publication date: 01/12/2021

Virtual screening (predicting which compounds within a specified compound library bind to a target molecule, typically a protein) is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Pharmit: interactive exploration of chemical space

Author: David Ryan Koes
Gaulton
Jocelyn Sunseri
Kim
Rego
Publication venue: Oxford University Press (OUP)

Crossref

GNINA 1.0: molecular docking with deep learning

Author: Andrew T. McNutt
David Ryan Koes
Jocelyn Sunseri
Matthew Ragoza
Paul Francoeur
Rishal Aggarwal
Rocco Meli
Tomohide Masuda
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2021

Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the Gnina docking software, which utilizes an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for Gnina 1.0 to optimize docking performance and computational cost. Docking performance, as evaluated by the percentage of targets where the top pose is better than 2 Å root mean square deviation (Top1), is compared to AutoDock Vina scoring when utilizing explicitly defined binding pockets or whole protein docking. Gnina, utilizing a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the root mean square deviation to the known binding pose. We provide the 1.0 version of Gnina under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina.

Directory of Open Access Journals

Improved understanding of aqueous solubility modeling through topological data analysis

Author: A Llinàs
A Lusci
A Mauri
Bo Wang
Carlos R. García-Alonso
DS Palmer
Francisco Belchi Guillamon
H Adams
J Huuskonen
J Wang
J-L Reymond
Jacek Brodzki
Jarmo Huuskonen
Jeremy G. Frey
Jocelyn Sunseri
JS Delaney
K Xia
L Duponchel
Lee Steinberg
M Hewitt
M. Landsberg
Mahesan Niranjan
Mariam Pirashvili
NM O’Boyle
Peter Bubenik
PY Lum
Sheila Ash
T Ichinomiya
T Kennedy
T Miyao
T Nakamura
WL Jorgensen
Y Ran
Y Yao
Yasuaki Hiraoka
Yongjin Lee
Z Cang
Zixuan Cang
Publication venue: Springer Science and Business Media LLC

Crossref

A D3R prospective evaluation of machine learning for protein-ligand scoring

Author: A Cherkasov
A Lusci
A Patrícia Bento
B Chen
BR Brooks
C Kramer
C McInnes
D Rogers
D Zilian
DA Case
David Ryan Koes
DB Kitchen
DR Koes
E Harder
E Lindahl
F Pedregosa
G Jones
G Schneider
GL Warren
H Gohlke
H Zhou
HJ Böhm
HM Ashtawy
I Muegge
J Gabel
J-H Hsieh
Jasmine Collins
JD Durrant
Jocelyn Sunseri
L Tan
Matthew Ragoza
MD Eldridge
MM Mysinger
NM O’Boyle
O Korb
O Trott
PJ Ballester
PS Charifson
R Raúl
R Wang
RA Friesner
RD Smith
RL DesJarlais
RN Jorissen
RS DeWitte
S Yin
S-Y Huang
SY Huang
T Cheng
T Sato
TJ Ewing
V Chupakhin
W Deng
WL Jorgensen
WT Mooij
Y LeCun
Publication venue: Springer Science and Business Media LLC

Crossref

Open source molecular modeling

Author: Abreu
Allouche
Amani
Andrade
Bahn
Bakan
Baker
Ballante
Ballester
Beauchamp
Beisken
Berenger
Berthold
Biasini
Bienfait
Blöchl
Bode
Borgelt
Brefo-Mensah
Bruns
Brylinski
Bullock
Burger
Caboche
Canepa
Cao
Carbonell
Carrió
Cereto-Massagué
Chéron
Cickovski
Cortes-Ciriano
Dalke
Dallakyan
David Ryan Koes
Demšar
Dolinsky
Drefahl
Dupradeau
Durrant
Earley
Eastman
Ebejer
Ellingson
Enkovaara
Ertl
Etienne
Filippov
Gasteiger
Genovese
Gezelter
Giannozzi
Goecks
Gonze
Guha
Guilloux
Guixà-González
Gütlein
Haider
Hall
Hanson
Hanwell
Haque
Hermann
Hildebrandt
Hinsen
Hoksza
Huey
Hutter
Höck
Jacob
Jahn
Janssen
Jeliazkova
Jiang
Jocelyn Sunseri
Kim
Kochev
Koes
Kovačević
Krause
Kresse
Krylov
Kuhn
Lawson
Leach
Lehtola
Lewis
Li
Lorenzen
Lotrich
Lowe
Lusci
Lyubartsev
Marcos-Alcalde
Marques
Martínez
Matthey
McGibbon
Meier
Melville
Michaud-Agrawal
Michel
Miteva
Mohebifar
Mohr
Moll
Moore
Morris
Murrell
Nikolaienko
Norrby
Oliveira
Ozaki
O’Boyle
Patlewicz
Pavlov
Peironcely
Plimpton
Pronk
Rahman
Rego
Rijnbeek
Romo
Rose
Rosen
Rudberg
Ruiz-Carmona
Rydberg
Salentin
Salomon-Ferrer
Sander
Scherer
Schmidtke
Schreyer
Smith
Somayeh Pirhadi
Spjuth
Steinbeck
Steinmann
Stålring
Sud
Sunseri
Supady
Sweeney
Taminau
Tarini
Till
Tosco
Tribello
Trott
Turney
Valiev
Vanommeslaeghe
Villoutreix
Vitalis
Vivo
Wang
Weisel
Wetzel
Wolstencroft
Wójcikowski
Yap
Yesylevskyy
Zhang
Zhu
Zonta
Zwier
Publication venue: Elsevier BV

Crossref